Abstract

The NETTAB 2012 workshop, held in Como on November 14-16, 2012, was devoted to "Integrated Bio-Search", that
is, to technologies, methods, architectures, systems and applications for searching, retrieving, integrating and
analyzing data, information, and knowledge with the aim of answering complex bio-medical-molecular questions,
i.e. some of the most challenging issues in bioinformatics today. It brought together about 80 researchers working
in the fields of Bioinformatics, Computational Biology, Biology, Computer Science and Engineering. More than
50 scientific contributions, including keynote and tutorial talks, oral communications, posters and software
demonstrations, were presented at the workshop. This preface provides a brief overview of the workshop and
briefly introduces the peer-reviewed manuscripts that were accepted for publication in this Supplement.



NETTAB Workshops 

NETTAB Workshops are a series of International meet- 
ings on "Network Tools and Applications in Biology" 
held annually in Italy [1]. They are aimed at introducing 
participants to the most promising among those innova- 
tive Information and Communication Technologies 
(ICTs) that are being applied to the biomedical applica- 
tion domain. Workshops include many focused sessions 
which are devoted to tools, systems, applications and 
perspectives. Keynote lectures introduce the sessions' 
topics and are followed by presentations selected from 
among the submitted contributions after peer review by 
members of the Scientific Committee. Discussion is a 
key factor, both within sessions and in a special Panel 
Discussion. Tutorials and poster sessions usually com- 
plete the agenda of the NETTAB workshops. 

Each year, the workshop focuses on a different technology
or domain. Since 2001, many different topics, often
related to data integration issues, have been discussed, thus
reflecting the actual evolution of ICT tools and platforms
in the last decade [2]. These included, e.g., Standardization
for data integration (Genoa, 2001), Multi-agent systems 
(Bologna, 2002), Scientific workflows (Naples, 2005), Grid 
and Web Services (Santa Margherita di Pula, 2006),
Semantic Web (Pisa, 2007), Collaborative research and 
development (Catania, 2009), Biological wikis (Naples,
2010) and Clinical Bioinformatics (Pavia, 2011).

The twelfth NETTAB Workshop: NETTAB 2012 

The NETTAB 2012 workshop, the twelfth in the series, 
was held in Como, Italy, on November 14-16, 2012. It was 
organized by Marco Masseroli, Politecnico di Milano, 
Milano, Paolo Romano, Cancer Comprehensive Center
and University Hospital San Martino IST, Genova, and
Frederique Lisacek, Swiss Institute of Bioinformatics, Gen- 
eva. Its rationale is based on the consideration that the 
data deluge of the current post-genomic era is providing 
scientists with potentially very valuable but often inaccessible
information. It is indeed difficult to find and extract
from high-throughput omics data the information
that is most reliable, most specific and most closely related to the
biological or biomedical questions to be answered. Such
questions are increasingly complex and they often simulta- 
neously regard many heterogeneous aspects of an organ- 
ism, tissue, or cell, and the role of their biomolecular 
entities. Several of these questions can be addressed only 
by comprehensively searching different types of data, 
which generally are distributed in many heterogeneous 
sources. Usually, scientists explore these data by using 
the individual search services and tools available on the 
Internet and they then struggle to combine the essential
information in order to answer their global questions. In 
this context, moreover, quality and consistency checking is 
a central issue that should be addressed. 

Searching and combining numerous open and linked 
data and algorithmic sources has the potential of 
reshaping the scenario of current bioinformatics applica- 
tions, going beyond the capabilities of conventional 
tools, Web services and existing search engines. Yet, it 
also presents new technological challenges. Solving data 
integration and automatic extraction problems requires 
new solutions, including the use of universal Uniform 
Resource Identifiers (URIs), efficient indexing, partial or 
approximate value matching, rank aggregation, continuous
or push-based search, exploratory methods and context-aware
paradigms, and collaborative and social search. It
also requires building new efficient information retrieval
approaches, based on automation of workflows, that
may contribute to new "good practices" in data searching,
retrieval and integration, with the specific goal of
ensuring quality of procedures, as well as their reproducibility
coupled with efficiency and efficacy.

On these premises, then, the NETTAB 2012 workshop 
has been focused on "Integrated Bio-Search", which 
includes all aspects that relate to technologies, methods, 
architectures, systems and applications for searching, 
retrieving, integrating and analyzing data, information, 
knowledge, infrastructures, services and tools that are 
required to answer complex bio-medical-molecular 
questions. 

The Call for abstracts attracted 34 submissions for
oral communications. All submissions underwent peer
review by members of the Scientific Committee, who
selected 12 oral communications, seven short oral communications,
and three technological communications
from industry; 29 posters were also presented at the
workshop. The Proceedings were published in
EMBnet.journal [3].

Three keynote talks were given. Erik Bongcam-Rudloff, 
from the Swedish University of Agricultural Sciences and 
the Uppsala University, gave a talk on "Integration and 
analysis of multi-type high-throughput data for biomole- 
cular knowledge discovery". "Semantics based biomedical 
knowledge search, integration and discovery" was the title 
of the lecture given by Barend Mons, Leiden University 
Medical Center and Netherlands Bioinformatics Center. 
Finally, Eric Neumann, PanGenX and Clinical Semantics 
Technologies, gave a talk on "Clinical and genomic data 
integration in support of biomedical research and clinical 
practice". 

Two well-received tutorials were also given by Alexander
Kel, GeneXplain GmbH and Institute of Chemical Biology
and Fundamental Medicine SBRAS, on "Multi-scale data
integration and virtual exploration from promoters,
through networks to drug targets", and by Katy Wolstencroft,
University of Manchester, who spoke about "The
Taverna Workbench: Integrating and analysing biological
and clinical data with computerised workflows". It is noteworthy
that the Web site of the workshop includes
video recordings of almost all of the oral presentations [4].

Selection of best papers 

Twenty-nine papers were submitted for publication in
this BMC Bioinformatics Supplement after the conference.
An Editorial Board was formed, including all members of
the NETTAB 2012 Scientific Committee. The Associate
Editors were:

• Francisco Azuaje, Centre de Recherche Public de la 
Sante, Luxembourg 

• Olivier Bodenreider, US National Library of Medicine, 
USA 

• Mario Cannataro, University of Catanzaro "Magna 
Graecia", Italy 

• Marie-Dominique Devignes, CNRS, University of 
Lorraine, Nancy, France 

• Christine Froidevaux, Universite Paris-Sud, France 

• Carole Goble, University of Manchester, United 
Kingdom 

• Nicolas Le Novere, European Bioinformatics Institute, 
United Kingdom 

• Ulf Leser, Humboldt University, Germany 

• Frederique Lisacek, Swiss Institute of Bioinfor- 
matics, Switzerland 

• Paolo Magni, University of Pavia, Italy 

• Roberto Marangoni, University of Pisa, Italy 

• Marco Masseroli, Politecnico di Milano, Italy 

• Paolo Missier, Newcastle University, United 
Kingdom 

• Heiko Muller, Italian Institute of Technology, Italy 

• Horacio Perez-Sanchez, University of Murcia, 
Spain 

• Paolo Romano, IRCCS AOU San Martino IST, Italy

• Patrick Ruch, University of Applied Sciences, 
Switzerland 

• Neil Sarkar, University of Vermont, USA 

Each Associate Editor managed the reviewing process
for one or two papers, according to his/her expertise in
workshop topics. Three international-level referees were
selected for each submission. Overall, 54 referees from
11 different countries were involved in the selection of
papers. A two-step peer review procedure was adopted:
authors whose paper was neither accepted nor rejected at
the first step were invited to submit a revised version,
prepared according to the referees' comments.
The Associate Editors made a global assessment
of the papers assigned to each of them and provided the
final recommendation for each paper. At the end of this
process, 14 papers were proposed and are now included
in this Supplement, and one more paper was proposed
for publication in another journal.

A short presentation of selected papers 

Workshop topics included four main areas. The first 
area relates to data integration. It includes syntactic and 
semantic methods and algorithms for biological and 
clinical data and knowledge integration, information and 
knowledge retrieval, data and knowledge query, data, 
information and knowledge extraction, and data and 
knowledge mining. The second area refers to new and 
optimized technologies for data management. It includes 
federated databases, data warehouses, and triple stores. 
It also includes topics such as biomedical terminologies and
ontologies, systems' interoperability, natural language 
processing, and scientific workflow processing. Tools 
and platforms for molecular data management and sto- 
rage, deep sequencing analysis, omics data computing, 
search computing, decision support, and clinical bioin- 
formatics characterize the third topic area. The fourth 
area includes examples of applications of these methods, 
technologies and tools in different biomedical domains, 
such as biomedical knowledge assessment, integration, 
discovery and validation, drug design, diagnosis and 
prognosis support, and personalized medicine. 

Masseroli, Mons et al. present some of the challenges 
and trends for the integration, search and processing of 
biological information [5]. Starting from the need for 
adopting common data models and for community dri- 
ven, re-usable efforts, the role of large scale interna- 
tional research infrastructures and of public-private 
partnerships targeted to addressing the complex chal- 
lenges of data intensive science is stressed. Some crucial 
social aspects are also discussed, as well as an open
business model for bioinformatics which could
reduce duplication of efforts.

The paper by Masseroli, Picozzi et al. "Explorative 
search of distributed bio-data to answer complex biome- 
dical questions" [6] presents the Bio-SeCo system, a 
platform dedicated to answering complex biomedical questions
by combining different heterogeneous services and
providing global, homogeneous results, thus facilitating 
navigation among distributed biomedical data and 
answering queries involving several kinds of data. 

The paper by Pio, Malerba et al. "Integrating micro- 
RNA target predictions for the discovery of gene regula- 
tory networks: a semi-supervised ensemble learning 
approach" [7] presents a machine learning based 
approach for the combination of different algorithms for 
the prediction of relationships between mRNA and 
miRNA, which is able to optimize the discovery of 
miRNA-mRNA regulatory networks. 



In the paper "ProphNet: A generic prioritization 
method through propagation of information" [8], Marti- 
nez, Cano et al. propose a novel network-based method 
for the prioritization of a set of entities that is able to 
integrate an arbitrary number of interrelated biological 
entities, thus overcoming current limitations of prioriti- 
zation tools. 

Cremaschi, Rovida et al. are the authors of "Correla- 
Genes: A new tool for the interpretation of the human 
transcriptomes" [9]. This paper presents a new approach 
and tool for mining public gene expression profiles from 
the Gene Expression Omnibus (GEO) system that couples 
association rules and the χ² test. This tool is also able to make
a great number of GEO expression data sets searchable. 

In their paper "Reducing bias in RNA sequencing data: a 
novel approach to compute counts" [10], Finotello, 
Lavezzo et al. describe maxcounts, a novel approach for 
measuring exon expression levels from RNA-Seq data, 
defined as the maximum number of counts among the 
positions of an exon, that aims at a more accurate estima- 
tion of expression levels from RNA-Seq data. A compari- 
son with a standard approach, using three different data 
sets and considering several criteria, is also presented. 
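
The definition quoted above can be illustrated with a minimal sketch. It assumes that the per-position read counts of an exon are already available as a plain list of integers; the function name and the input layout are hypothetical and are not taken from the authors' software.

```python
from typing import Sequence


def maxcounts(per_position_counts: Sequence[int]) -> int:
    """Return the maximum read count among the positions of one exon.

    `per_position_counts` holds one non-negative integer per exon
    position (an assumed input format). An exon with no positions
    yields 0.
    """
    return max(per_position_counts, default=0)


# Hypothetical coverage profile of a five-position exon.
exon_coverage = [0, 3, 7, 7, 2]
print(maxcounts(exon_coverage))
```

The sketch covers only the definition itself; read alignment, normalization and the comparison with a standard approach described in [10] are outside its scope.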

The paper "AnnotateGenomicRegions: A Web applica- 
tion" [11] by Zammataro, De Molfetta et al. describes a 
simple, but fast and effective, Web application that 
accepts genomic regions as input, downloads genome 
annotations, both overlapping and neighbouring, from 
the Genome Browser, including RefSeq transcripts, 
EnsEMBL transcripts, all_mrna transcripts, CpG islands 
and promoter regions of transcripts, and makes them 
available through both a Web site and a Web API. Being 
available as a Web interface, AnnotateGenomicRegions is
user-friendly and scales well with respect to the load. 

Campbell, Ranzinger et al. diagnose the causes of the 
slow development of glycobioinformatics and the difficul- 
ties encountered in defining adequate formats for repre- 
senting complex carbohydrates in their paper "Toolboxes 
for a standardised and systematic study of glycans" [12]. 
The paper strongly suggests the integration of glycomics 
in the -omics landscape to better understand biological 
processes and it highlights the necessary steps to achieve 
this goal. 

In the paper "A tool for mapping Single Nucleotide 
Polymorphisms using Graphics Processing Units" [13], 
Manconi, Orro et al. present a tool which maps a short 
sequence of SNP against a DNA sequence to find its phy- 
sical position in that sequence. The tool does not provide 
an original algorithm, but it leverages three existing
software applications. The integration of existing soft- 
ware to solve a concrete problem, however, is a valuable 
solution for many biological problems, able to avoid 
duplication of efforts and to exploit existing resources to 
their best. 

The paper by Gonzalez-Beltran, Neumann et al. "The
Risa R/Bioconductor package: integrative data analysis
from experimental metadata and back again" [14] presents,
with real world examples, a simple, effective and
long-awaited package that is a crucial tool for bridging
data curation and data analysis, a bottleneck
for research data management.

Scientific workflows often contain patterns,
so-called "anti-patterns", that can lead to over-complicated
design and may compromise the sharing and reuse of workflows.
The paper "Distilling structure in Taverna scientific
workflows: a refactoring approach" [15] by Cohen-Boula- 
kia, Chen et al. presents a method to detect and remove 
"anti-patterns" in workflows automatically. The paper for- 
mally introduces two anti-patterns and illustrates the 
application of the method on more than 1,500 workflows 
from two distinct domains. 

The paper "QTREDS: a Ruby on Rails-based platform 
for omics laboratories" [16] by Palla, Frau et al. describes a 
lightweight Laboratory Information Management System 
(LIMS) designed for the needs of a sequencing and geno- 
typing laboratory. The system includes various functional 
blocks, including samples and reagents management, 
workflow generation and an articulated user interface. 

In their paper "Guidelines for managing data and pro- 
cesses in bone and cartilage tissue engineering" [17], 
Viti, Scaglione et al. introduce a conceptual framework 
for bone/cartilage tissue engineering data. They present 
guidelines defining the minimum information necessary 
for describing an experimental study in this domain, as 
well as a dedicated ontology that is oriented both to cells
and to chemical composition, morphology, and physical 
characterization of biomaterials involved in bone/carti- 
lage tissue engineering research. 

Text-mining applications for biomedical patents are 
relatively rare, although the size of patent collections is 
rapidly increasing. The paper "Development and tuning 
of an original search engine for patent libraries in med- 
icinal chemistry" [18] by Pasche, Gobeill et al. presents 
an advanced search and retrieval engine for patent corpora.
It also reports the results of extensive tests made
to evaluate the impact of different search strategies on
the performance of the search engine when applied to
the most frequent search tasks performed in medicinal
chemistry.